1 Introduction

In this tutorial, we will cover the basics of making figures in R and provide several examples and exercises to explore a variety of figure types including scatter plots, box plots, histograms, networks, and phylogenetic trees. The tutorial will be taught in two sessions:

  • Session 1: Introduction, Basic Figure Making, Using ‘ggplot2’
  • Session 2: Specialized Figure Making: Making Networks and Phylogenetic Trees

1.1 Learning outcomes

This tutorial provides students with opportunities to learn procedures to:

  • Insert figures into an R Markdown file.
  • Create figures using R’s default graphics.
  • Create, customize, and arrange figures in R using ‘ggplot2’ and associated packages.
  • Create and customize network and phylogenetic tree figures.

1.2 Associated Files, Data, and Packages

The files to support this tutorial are deposited in the shared Google Drive at this path:

  • Reproducible_Science/Bioinformatic_tutorials/Chapter_10

In addition to the data files stored in this path, this tutorial will also use the pressure and iris datasets available through R’s ‘datasets’ package.

To get started, you will need to install the following packages:

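#install packages - these are the ones you may not already have installed
install.packages(c("ggplotgui", "gridExtra", "ggpubr", "GGally", "network", "sna", "RColorBrewer", "ape", "devtools", "tidyverse", "BiocManager"))
#'grid' ships with base R, and 'ggtree' is distributed through Bioconductor rather than CRAN
BiocManager::install("ggtree")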

You will also need to make sure these packages are installed and loaded to successfully complete the tutorial:

#load packages
library(knitr) #knitting tools
library(datasets) #to access base datasets - there's a lot to choose from in here!
library(readr) #to load other data files (e.g., .csv)
library(ggplot2) #to do some fancier plotting
library(ggplotgui) #to access a shiny app for ggplots
library(gridExtra) #to arrange multiple plots on a page
library(grid) #another way to arrange multiple plots
library(ggpubr) #great package for making publication ready plots
library(GGally) #contains the ggnet2 function we will use to make networks
library(network) #tools to create and modify network objects
library(sna) #more tools for network analysis 
library(RColorBrewer) #provides color schemes for figures
library(ape) #tool for phylogenetic trees from DNA seq data
library(devtools) #developer tools (e.g., downloading packages from other places)
library(tidyverse) #loads core tidyverse packages (e.g., ggplot2, dplyr, tidyr, tibble, etc.)
library(ggtree) #more phylogenetic tree tools

2 Making Figures with Default Graphics and ggplot2

This section will address different ways to make figures in R. Specifically, we will cover:

  • Chunk options for figures
  • Plotting with R’s default graphics
  • Introduction to ‘ggplot2’
    • Familiarizing ourselves with ggplot’s grammar (Example 1)
    • Exploring a few methods for arranging ggplots (Example 2)
  • Exercises to create and arrange different types of figures using ‘ggplot2’ and associated packages

This is not an exhaustive tutorial for all of the features of ‘ggplot2’ or R’s default graphics, but instead serves as a base to get you started with visualizing your data in more meaningful ways using R!

2.1 Chunk Options for Figures

Gandrud outlines several useful chunk options for figures in R Markdown documents:

  • fig.path: This tells knitr where to save your figures by specifying a file path.
  • out.height and out.width: These set your figure’s output height and width. R Markdown uses pixels (e.g., to set the height to 200 pixels, use out.height=‘200px’). If you are using LaTeX, you can also specify the height or width using cm, in, or as a proportion of a page element (e.g., to set the width to 80% of the text width, use out.width=‘0.8\textwidth’).
  • fig.align: This sets the alignment of your figure to left, center, or right.
  • fig.cap and fig.lp: These set your figure’s caption and its LaTeX label prefix.
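These options can also be set once for all chunks in a document rather than chunk by chunk. The following is a minimal sketch using the opts_chunk$set() function of ‘knitr’, typically placed in a setup chunk; the folder name "figures/" and the 80% width are illustrative assumptions, not settings used elsewhere in this tutorial.

```r
library(knitr)

# Set figure defaults for every chunk that follows.
# The folder name "figures/" and the 80% width are illustrative choices.
opts_chunk$set(
  fig.path  = "figures/",  # where knitr saves figure files
  fig.align = "center",    # left, center, or right
  out.width = "80%"        # output width as a share of the available width
)

# Check what is currently set
opts_chunk$get("fig.align")
```

Options given in an individual chunk header still override these document-wide defaults.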

2.2 Inserting Figures (Review from Chapter 1)

In Chapter 1, we learned how to knit figures from images (e.g., if you made a figure in a different program or R file), so we will just briefly review that here. We will use the include_graphics() function of the ‘knitr’ package to do this. This example shows how to knit some draft ArcGIS maps while also specifying the fig.cap, fig.align, and out.width chunk options.

knitr::include_graphics("Sample_ArcGIS_Maps.PNG")
Example: Inserting ArcGIS Draft Maps

Note: Use ‘echo=FALSE’ in chunk options to hide your code chunk. The code chunk is included here to demonstrate some of the other chunk options specific to figures.

2.3 Refresher: Plots with R’s Default Graphics

Because we already have some experience with Base R’s plotting functions, this section serves as a little refresh on how to do some basic plotting before we get into “fancy” plotting with ‘ggplot2’. Every time you create a new R Markdown document, you probably have noticed the inclusion of the pressure plot code chunk. That is exactly how to plot in Base R!

#taking a look at what the pressure data is
summary(pressure)
##   temperature     pressure       
##  Min.   :  0   Min.   :  0.0002  
##  1st Qu.: 90   1st Qu.:  0.1800  
##  Median :180   Median :  8.8000  
##  Mean   :180   Mean   :124.3367  
##  3rd Qu.:270   3rd Qu.:126.5000  
##  Max.   :360   Max.   :806.0000
#plotting the pressure data
plot(pressure)

Let’s try this again with some very unscientific fake data on different dogs’ fetch abilities and “goodness” ratings. For now, we will focus on their fetch abilities. Let’s start by loading the data file and checking it out.

#load and view data
GoodDogs <- read_csv("GoodDogs.csv")
View(GoodDogs)

Now that we can see we have data on both fetch time and fetch distance, let’s make a scatter plot, assuming that time is dependent on the distance the ball is thrown (and not on how fast a dog is or its focus on the task at hand).

plot(GoodDogs$Fetch_Distance, GoodDogs$Fetch_Time)

Now that we have the basic scatter plot down, let’s clean up our axis labels and add a title. We can do this by specifying the xlab, ylab, and main arguments in the plot() function.

plot(GoodDogs$Fetch_Distance, GoodDogs$Fetch_Time,
     xlab = "Distance Ball is Thrown (ft)",
     ylab = "Fetch Time (s)",
     main = "Throw Distance vs Time while Playing Fetch")

2.4 Using ggplot2

2.4.1 What is ggplot2?

‘ggplot2’ allows for additional flexibility and customization of our figures. Using the “grammar of graphics”, ‘ggplot2’ is able to build every graph from the same base components:

  • a data set (e.g., your data)
  • a set of geoms (e.g., the points representing your data + how you want them to look)
  • a coordinate system (e.g., defining where the data points are)
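As a minimal sketch of how these three components map onto code, the example below builds a plot from the built-in iris dataset (used here only as a stand-in for your own data). The coordinate system is written out explicitly even though Cartesian coordinates are what ‘ggplot2’ uses when none is specified.

```r
library(ggplot2)

# data set: the built-in iris data (a stand-in for your own data)
# geoms: points, one per flower
# coordinate system: Cartesian, written out here for clarity
p <- ggplot(data = iris, mapping = aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point() +
  coord_cartesian()
p
```

Each component is added with +, which is why ggplot code reads as a stack of layers.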

The grammar for this package can feel a little unwieldy at first, so keeping this cheatsheet (link) handy might be helpful!

2.4.2 Example 1: Revisiting Our GoodDogs Data

Let’s start to get to know ggplot better by revisiting our fetch data. First, we need to specify our data using ggplot(). Next, we specify that we want a scatter plot using geom_point().

#make a basic scatter plot
ggplot(GoodDogs, aes(x = Fetch_Distance, y = Fetch_Time)) + #columns are looked up in GoodDogs, so no $ is needed inside aes()
  geom_point() 

Now let’s make our figure look a little bit nicer by cleaning up our axis labels.

#add axis labels and a title
ggplot(GoodDogs, aes(x = Fetch_Distance, y = Fetch_Time)) +
  geom_point() +
  xlab("Distance Ball is Thrown (ft)") + #rename the x axis label
  ylab("Fetch Time (s)") + #rename the y axis label
  ggtitle("Throw Distance vs Time while Playing Fetch") + #add a plot title
  theme(plot.title = element_text(hjust = 0.5))  #this centers our title

We also might want to know some more about our data, like which points are associated with the different dog breeds in our dataset. This will use our same code from above, but adds a color argument inside the aes() call of ggplot().

#add color based on dog breed
ggplot(GoodDogs, aes(x = Fetch_Distance, y = Fetch_Time, color = Breed)) +
  geom_point() +
  xlab("Distance Ball is Thrown (ft)") + 
  ylab("Fetch Time (s)") + 
  ggtitle("Throw Distance vs Time while Playing Fetch") + 
  theme(plot.title = element_text(hjust = 0.5)) 

Finally, let’s refine our legend and title styling a bit more. We can also streamline our labels code using the labs function. Let’s also change our plot theme to remove the gridlines and gray background.

ggplot(GoodDogs, aes(x = Fetch_Distance, y = Fetch_Time, color = Breed)) +
  geom_point() +
  labs(color = "Dog Breeds", x = "Distance Ball is Thrown (ft)", y = "Fetch Time (s)") + #combine our label functions and add in an argument for the legend title
  ggtitle("Throw Distance vs Time while Playing Fetch") +
  theme(plot.title = element_text(hjust = 0.5, face = "bold"), #center the title and make it bold (face)
        legend.title = element_text(face = "bold", hjust = 0.5), #bold and center the legend title (hjust replaces the older legend.title.align argument)
        panel.background = element_blank(), #remove the gray background
        panel.border = element_blank(),
        panel.grid.major = element_blank(), #remove the gridlines
        panel.grid.minor = element_blank(),
        axis.line = element_line(colour = "black")) #draw black axis lines

2.4.3 Example 2: Arranging ggplots

Now that we are feeling more comfortable with ggplot’s grammar, let’s explore some additional customization options using ‘ggplot2’.  

Let’s start by exploring our GoodDogs data using a point and click shiny app available through the ‘ggplotgui’ package. This is a really nice tool for getting familiar with the ‘ggplot2’ grammar and for very quickly visualizing our data. We will use this to create a very simple histogram to see the age distribution of the dogs in our dataset.

#ggplotgui::ggplot_shiny(GoodDogs) #this opens a shiny app to point and click make a ggplot; note: commented out for tutorial - Haley will demonstrate this for the class

We can copy the code that the ggplot GUI app uses to produce this histogram as a shortcut. I also copied the code for age histograms by breed and goodness rating boxplots by breed. Because the app is limited in features, we will work on customizing these figures more next in Exercise 1A. But first, let’s start by running this code chunk and see the base ggplots we have produced. Ggplots can be saved as objects, which is a helpful feature.

#copied from the ggplot GUI app, changed df to GoodDogs and saved as object 
DogAge <- ggplot(GoodDogs, aes(x = Age)) +
  geom_histogram(position = 'identity', alpha = 0.83, binwidth = 1) +
  labs(x = 'Age', y = 'Count') +
  ggtitle('Dog Ages') +
  theme_classic()
DogAge

#copied from the ggplot GUI app, changed df to GoodDogs and saved as object; note: this is an example of faceting
DogAgeBreed <- ggplot(GoodDogs, aes(x = Age)) +
  geom_histogram(position = 'identity', alpha = 0.8, binwidth = 1) +
  facet_grid( . ~ Breed ) +
  labs(x = 'Dog Age (years)', y = 'Count of Dogs') +
  ggtitle('Dog Breed Age Comparisons') +
  theme_classic()
DogAgeBreed

#copied ratings boxplot, changed df to GoodDogs and saved as object
DogRating <- ggplot(GoodDogs, aes(x = Breed, y = Rating)) +
  geom_boxplot(notch = FALSE) +
  labs(x = 'Dog Breed', y = 'Goodness Rating') +
  ggtitle('Dog Breed Goodness Ratings') +
  theme_classic()
DogRating

Before we get to our first Exercise to start practicing making and arranging ggplots, let’s first go over how to arrange multiple ggplots on a page. We can use the plots we just made as an example. Faceting, like we see in the DogAgeBreed plot, is one way to achieve this.  

Method 1: ‘gridExtra’ package and grid.arrange() function  

The grid.arrange() function of the ‘gridExtra’ package is another easy option for arranging multiple plots. We have already saved our ggplots as objects, which is the first step in arranging them.

grid.arrange(DogAgeBreed, DogRating, nrow=1) #nrow=1 is specifying we want them in a single row

But uh oh! That’s difficult to read. Let’s try arranging them in two rows.

grid.arrange(DogAgeBreed, DogRating, nrow=2) #trying two rows

You can also arrange the plots by specifying columns. Let’s add in our DogAge histogram to illustrate this.

#two rows, two columns
grid.arrange(DogAge, DogAgeBreed, DogRating, nrow=2, ncol=2) 
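When plots of different shapes share a row, relative column widths can help with crowding. The sketch below uses the built-in iris data as a stand-in (so it runs without GoodDogs.csv) and the widths argument of grid.arrange(). It also shows arrangeGrob(), which builds the same layout without drawing it; the object names are arbitrary.

```r
library(ggplot2)
library(gridExtra)

# Two simple plots from the built-in iris data (stand-ins for the dog plots)
scatter <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) + geom_point()
boxes   <- ggplot(iris, aes(x = Species, y = Sepal.Width)) + geom_boxplot()

# One row, with the first column twice as wide as the second
grid.arrange(scatter, boxes, nrow = 1, widths = c(2, 1))

# arrangeGrob() builds the same layout without drawing it,
# so the arrangement can be stored as an object and saved later
two_panel <- arrangeGrob(scatter, boxes, nrow = 1, widths = c(2, 1))
```

The stored two_panel object can be passed to ggsave() through its plot argument to write the arrangement to a file.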

Method 2: ‘grid’ package  

The ‘grid’ package also allows for arranging plots and can be useful for more complicated arrangements. For example, it can help us span a plot over multiple columns or rows. Let’s see an example here.

# Move to a new page
grid.newpage()
# Create layout : nrow = 3, ncol = 2
pushViewport(viewport(layout = grid.layout(nrow = 3, ncol = 2)))
# A helper function to define a region on the layout
define_region <- function(row, col){
  viewport(layout.pos.row = row, layout.pos.col = col)
} 
# Arrange the plots
print(DogAgeBreed, vp = define_region(row = 1, col = 1:2))   # Span over two columns
print(DogAge, vp = define_region(row = 2, col = 1))
print(DogRating, vp = define_region(row = 3, col = 1:2))

Method 3: ‘ggpubr’ package and the ggarrange() function

ggarrange(DogAgeBreed,                  # First row with breed/age histogram
          ggarrange(DogAge, ncol = 2), # Second row with age histogram
          ggarrange(DogRating, ncol = 1), #Third row with rating box plots
          nrow = 3)                                   
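ggarrange() can also collect the legends of several plots into one shared legend. Here is a minimal sketch, again using the built-in iris data as a stand-in so that it runs on its own; the plot names are arbitrary, and the common.legend and legend arguments are the pieces doing the work.

```r
library(ggplot2)
library(ggpubr)

# Two plots that share the same color mapping (Species)
scatter <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point()
boxes <- ggplot(iris, aes(x = Species, y = Sepal.Width, color = Species)) +
  geom_boxplot()

# Stack the two plots in two rows and keep a single shared legend on the right
combined <- ggarrange(scatter, boxes, nrow = 2, ncol = 1,
                      common.legend = TRUE, legend = "right")
combined
```

A shared legend only makes sense when the plots map the same variable to the same aesthetic, as both plots do here with color = Species.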

2.5 Exercise 1A: Customizing the DogAgeBreed and DogRating ggplots

Because there is likely a range of ggplot experience in this group, think of these exercises like a choose your own adventure. If you want to play around with ‘ggplot2’ and do your own thing with the plots, feel free to do so! Otherwise, you can work to produce something like this:
Exercise 1A Sample Figures

Steps:

  1. Create a histogram of dog ages faceted and colored based on dog breed. This histogram should have no background, panels, or gridlines. Add a title and axis labels.
    • Save the histogram as an object once you are happy with its format.
  2. Create a boxplot (x = Breed, y = Rating) and color it based on dog breed. This boxplot should have no background, panels, or gridlines. Add a title and axis labels.
    • Save the boxplot as an object once you are happy with its format.
  3. Use ggarrange() or another method to format your two plots into a two row arrangement with a common legend.

If you get stuck, my solution is available in the .Rmd file (Chunk: Solution1A)!

2.6 Exercise 1B: Revisiting the Iris Data from Chapter 8!

We will use the iris dataset that we first saw in the Chapter 8 tutorial to work on some more complex arrangement and figure customization. We will use the steps below to produce this:
Exercise 1B Sample Figures

Our end result is by no means complete, but will give you some practice for making different plot types and getting used to using these different packages. Feel free to edit axis labels, titles, and more!  

Steps:

  1. Create the following plots using the iris dataset and ‘ggplot2’ (challenge: use the ‘ggpubr’ package to create the plots - see link below):
    • A box plot of the Sepal.Width data
    • A faceted scatter plot with trend lines of y = “Petal.Length” by x = “Petal.Width”
    • A jittered violin plot of Petal.Length by species
  2. Create an “all.sepals” ggplot object where x = “Sepal.Length”, y = “Sepal.Width”
  3. Use the “all.sepals” object to create a scatter and 2D density plot.
  4. Use the minimal theme for all plots.
  5. Arrange the plots from steps 1-3 in 3 rows and with a common legend for iris species. Try specifying column widths to help with crowding in the resulting arrangement.

If you get stuck, my solution is available in the .Rmd file (Chunk: Solution1B)!

2.7 Challenge/Next Steps

To explore ‘ggpubr’, ‘gridExtra’, ‘ggplot2’, and other packages more, check out this great tutorial from STHDA (link). This tutorial goes into even more ways to arrange and visualize plots to get your figures publication-ready!

3 Exercise 2: Visualizing Networks

This section will address different ways to modify network visualizations in R using ggnet2. Specifically, we will cover:

  • Creating a basic network
  • Introduction to ‘ggnet2’
    • Modifying node/edge size and color
    • Changing node placement, shape, and transparency
    • Labelling our nodes
  • Exercises to practice customizing our network visualizations using ‘ggnet2’ and associated packages

3.1 What is a network?

At its core, a network is simply a set of vertices connected by a set of edges. There are many kinds of networks, and network analyses can be used across disciplines. For instance, networks of scientific collaboration, a food web of marine animals, and American college football games are all covered in a paper on community detection in networks by Girvan and Newman (2002). Additionally, Buldyrev et al. (2010) study node failure in interdependent networks like power grids. Social networks such as links between television and film actors found on http://www.imdb.com/ and neural networks, like the completely mapped neural network of the C. elegans worm are also extensively studied (Watts and Strogatz, 1998).

3.2 Network terminology

In network analyses, “nodes” designate the vertices of a network whereas “edges” indicate the ties between nodes. The edges in networks can be directed, indicating an ordering of vertices (wherein switching the direction of the edge would change the structure of the network), or the edges can be undirected, meaning the edges are simply connections between vertices where order does not matter.

  • The World Wide Web is an example of directed edges: hyperlinks connect one Web page to another, but not necessarily the other way around!
  • Co-authorship networks are examples of undirected networks, wherein nodes are authors and are connected by an edge if they have written a publication together (direction does not matter!)
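The difference between directed and undirected edges can be sketched with the ‘network’ package. The three nodes and their ties below are made-up toy data, and the object names are arbitrary.

```r
library(network)

# Toy adjacency matrices for three pages/authors a, b, c (made-up data)
nodes <- c("a", "b", "c")

# Directed: a links to b, and b links to c, but not the other way around
directed_adj <- matrix(c(0, 1, 0,
                         0, 0, 1,
                         0, 0, 0),
                       nrow = 3, byrow = TRUE, dimnames = list(nodes, nodes))
directed_net <- network(directed_adj, directed = TRUE)

# Undirected: the same ties, stored symmetrically because order does not matter
undirected_adj <- directed_adj + t(directed_adj)
undirected_net <- network(undirected_adj, directed = FALSE)

is.directed(directed_net)
is.directed(undirected_net)
```

Both objects contain two edges; what differs is whether the order of the two nodes in each edge carries meaning.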

3.3 Visualizing a network diagram in R using ggnet2

To continue our understanding of ‘ggplot2’, we will be using the ggnet2 function, which offers a larger range of network visualization in a single function call. ggnet2 plots network objects as ‘ggplot2’ objects that can be styled using ‘ggplot2’ scales and themes. While the ggnet2 function uses a syntax that may be familiar to those who have worked with ‘ggplot2’, it is also designed to be easily understood by users who may not be as familiar with ‘ggplot2’ objects. Thus, while ggnet2 applies the “grammar of graphics” to network objects, the function itself works very much like the plotting functions of the ‘igraph’ and/or ‘network’ packages, in that a long series of arguments is used to control pretty much every aspect of the network visualization. ggnet2 is available through the ‘GGally’ package.

You may already have these packages loaded from Session 1, but as a reminder, this section of the tutorial relies on the ‘GGally’, ‘network’, ‘sna’, and ‘RColorBrewer’ packages, all of which appear in the package list in Section 1.2.

3.4 Let’s create a basic network!

For the purpose of this tutorial, we are going to create a basic undirected network with 10 nodes named “a, b, …, i, j”, where each possible edge between a pair of nodes has a 50% chance of existing (tprob = 0.5):

# random graph
net = rgraph(10, mode = "graph", tprob = 0.5)
net = network(net, directed = FALSE)

#vertex names
network.vertex.names(net) = letters[1:10]

#visualize the network
ggnet2(net)

The net argument is the only compulsory argument of ggnet2. It can be a network object or any object that can be coerced to that class through edgeset.constructors functions, such as adjacency matrices, incidence matrices, and edge lists.
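As a small sketch of that coercion, the example below starts from a made-up edge list (one row per tie, with nodes identified by number), converts it to a network object explicitly, and then plots it. The object names are arbitrary.

```r
library(GGally)
library(network)

# A made-up edge list: each row is one tie between two numbered nodes
edges <- matrix(c(1, 2,
                  2, 3,
                  3, 4,
                  4, 1),
                ncol = 2, byrow = TRUE)

# Coerce the edge list to an undirected network object
net_from_edges <- network(edges, matrix.type = "edgelist", directed = FALSE)

# ggnet2 accepts the resulting network object directly
ggnet2(net_from_edges)
```

Doing the conversion yourself, rather than leaving it to ggnet2, makes it easy to check the number of nodes and edges before plotting.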

3.5 Let’s start to modify our network!

The most basic properties of our network that we might want to change are the size/color of the nodes and/or the size/color of the edges. Let’s take a look at how we modify these properties…

# editing node and edge properties
ggnet2(net, node.size = 6, node.color = "black", edge.size = 1, edge.color = "grey")

Note that the vertex-related arguments of ggnet2 start with node, and the edge-related arguments start with edge. We can also abbreviate the node.color and node.size arguments to save time!

ggnet2(net, size = 6, color = "black", edge.size = 1, edge.color = "grey")

Using these basic methods, we can set the color, size, shape, and even transparency of the nodes. Let’s practice!

3.5.1 Exercise 2A

Using the code chunks above as an example, modify your network so that it contains:

  • grey nodes
  • red edges
  • try different versions with small vs. big nodes and thin vs. thick edges

If you need help, a solution is available in the .Rmd file associated with this tutorial (Chunk solution2A)!

3.6 Node placement

In addition to the attributes modified above, we can also modify the POSITION of our nodes. By default, ggnet2 places nodes using something called the Fruchterman-Reingold force-directed algorithm. However, there are other layouts that we might want to use instead based on our analysis. There is no single, “good” layout algorithm, and different approaches may be valuable under different circumstances. For more information, you can see the documentation of the gplot.layout function for the list of placement algorithms. Let’s test out a few different common algorithms. How do these networks differ from one another?

# algorithms of node placement
ggnet2(net) # default Fruchterman-Reingold force-directed algorithm

ggnet2(net, mode = "circle") # positions nodes in a circle

ggnet2(net, mode = "kamadakawai") #uses the Kamada-Kawai force-directed algorithm

ggnet2(net, mode = "random") #randomly positions nodes 

As noted, the default is Fruchterman-Reingold. This function generates a layout using a variant of Fruchterman and Reingold’s force-directed placement algorithm. The circle algorithm places vertices uniformly in a circle and can’t be modified by any additional arguments. The kamadakawai function generates a vertex layout using a version of the Kamada-Kawai force-directed placement algorithm. As one might expect, the random function places vertices randomly - you can re-run this line of code repeatedly and see all the different node arrangements that are randomly generated!
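Because some layouts involve randomness, a figure can change every time the document is knit. One way to keep a layout stable is to set R’s random seed right before plotting. The sketch below assumes that the random layout draws on R’s random number generator; the seed values and object names are arbitrary.

```r
library(GGally)
library(network)
library(sna)

set.seed(10)  # arbitrary seed so the random graph itself can be recreated
demo_net <- network(rgraph(10, mode = "graph", tprob = 0.5), directed = FALSE)

# Resetting the same seed before each call should give the same "random" layout
set.seed(1)
layout_a <- ggnet2(demo_net, mode = "random")
set.seed(1)
layout_b <- ggnet2(demo_net, mode = "random")

# Compare the node coordinates stored in the two plots
all.equal(layout_a$data[, c("x", "y")], layout_b$data[, c("x", "y")])
```

The same approach can be tried with the default layout, which starts from random node positions.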

3.6.1 Exercise 2B

Open up the help documentation for the gplot.layout function and look at the list of possible layouts. Choose one we haven’t looked at yet and edit your network code to reflect a different mode. How did your function alter the arrangement of the network from the default settings?

3.7 Advanced node colors

We have already considered how to do a basic modification of node colors. Let’s now assign a vertex attribute “phono”, which indicates whether the name of the vertex is a vowel or consonant. This attribute can be passed to ggnet2 to indicate that the nodes belong to a certain group. We will pass the name of the vertex attribute to the color argument, which will then use it to map the colors of the nodes.

# create vertex attribute to indicate vowel or consonant
net %v% "phono" = ifelse(letters[1:10] %in% c("a", "e", "i"), "vowel", "consonant")

# map node color based on vertex attribute
ggnet2(net, color = "phono")

By default, ggnet2 assigns a grayscale color to each group, but we can modify this behavior! There are different options to modify the color assignment. Let’s try out a few options! One method consists of “hard-coding” the colors into the graph by assigning them to a vertex attribute, and then passing this attribute to ggnet2:

# hard-coding the color assignments
net %v% "color" = ifelse(net %v% "phono" == "vowel", "steelblue", "tomato")
ggnet2(net, color = "color")

We could also create a named vector consisting of a color legend through the palette argument, or generate a color vector “on the fly” directly in the function call (a more condensed version of the first option):

#color legend as a named vector using the palette argument
ggnet2(net, color = "phono", palette = c("vowel" = "steelblue", "consonant" = "tomato"))

# generate color vector on the fly
ggnet2(net, color = ifelse(net %v% "phono" == "vowel", "steelblue", "tomato"))

Lastly, we can also use pre-defined color palettes using the ‘RColorBrewer’ package. The palette argument can be the name of any ColorBrewer palette, which ggnet2 will then use to color the nodes. If it returns an error message, there may not be enough colors in the palette to encompass all node types.

# using pre-defined color palettes
ggnet2(net, color = "phono", palette = "Set2")
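To check in advance whether a palette has enough colors for your groups, you can look it up in the brewer.pal.info table that ships with ‘RColorBrewer’. This is a small sketch; "Set2" is simply the palette used above.

```r
library(RColorBrewer)

# Maximum number of colors offered by the "Set2" palette
brewer.pal.info["Set2", "maxcolors"]

# Pull three colors from the palette as hex codes
brewer.pal(n = 3, name = "Set2")

# View all available palettes and their sizes
head(brewer.pal.info)
```

If the number of groups in your vertex attribute exceeds maxcolors, a different palette or a hand-built named vector is the safer choice.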

3.8 Node sizes

Now let’s start to think about the size of our nodes! In network analyses, it is common to size the nodes based upon their centrality within the network or some other element of interest. Just like the color argument, the size argument of ggnet2 can take a single numeric value, a vector of values, or a vertex attribute:

# changing node size with a vertex attribute
ggnet2(net, size = "phono")

Just as we used palettes to change the color of the nodes, we can also use the argument size.palette to create nodes of different sizes that are easier to distinguish visually:

# using size.palette
ggnet2(net, size = "phono", size.palette = c("vowel" = 10, "consonant" = 1))

We can also modify the nodes so that their size corresponds to their centrality, or number of connections, within the network. We can define two separate measures of degree centrality: indegree, which is the count of the number of ties directed to the node, and outdegree, which is the number of ties the node directs to others.

When ties are associated with some positive aspect such as friendship or collaboration, indegree is often interpreted as a form of popularity, and outdegree as gregariousness. ggnet2 also recognizes total (or Freeman) degree, which is the sum of a node’s indegree and outdegree. Note that this is a different measure from “betweenness”, which describes a node acting as a bridge along the shortest path between two other nodes. In addition to “indegree”, “outdegree”, and “freeman”, ggnet2 also understands the argument “degree”, which is equivalent to “freeman”.

# changing node size with freeman degrees
ggnet2(net, size = "degree")
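To see the numbers behind the node sizes, the tie counts can be computed directly with the degree() function of the ‘sna’ package and stored as a vertex attribute. This is a sketch on a freshly generated toy network; the seed, the object names, and the attribute name "n_ties" are arbitrary choices.

```r
library(GGally)
library(network)
library(sna)

set.seed(10)  # arbitrary seed for a reproducible toy graph
demo_net <- network(rgraph(10, mode = "graph", tprob = 0.5), directed = FALSE)

# Count each node's ties; gmode = "graph" treats the network as undirected
tie_counts <- sna::degree(demo_net, gmode = "graph")
tie_counts

# Store the counts as a vertex attribute (the name "n_ties" is arbitrary)
demo_net %v% "n_ties" <- tie_counts

# Size nodes by the stored attribute
ggnet2(demo_net, size = "n_ties")
```

In an undirected network every edge contributes to the count of both of its endpoints, so the counts sum to twice the number of edges.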

3.8.1 Exercise 2C

Change your network to reflect either “indegree” or “outdegree.” Did it make a noticeable difference in your network visualization? Why or why not?

3.9 Changing node shapes and transparency

You may have already realized that circles are the default node shape for ggnet2, but they are not the only option! We can also modify the shape and transparency of the nodes in the same manner that we modified the color and size of the nodes, either through a single value, a vector of values, or a vertex attribute!

Note: the second example below will return a warning about a duplicated plotting parameter. This is an innocuous warning that is produced by mapping two characteristics of the nodes to the same vertex attribute. It cannot be avoided without modifying ggplot2.

# changing shape using a single value
ggnet2(net, color = "phono", shape = 15)

# changing shape using a vertex attribute
ggnet2(net, color = "phono", shape = "phono")

3.9.1 Exercise 2D: Challenge!

Recall that we used the palette argument to create a named vector consisting of a color legend; can we use the palette argument to assign specific shapes and/or transparencies to our nodes depending on if they are a consonant or vowel? Let’s try it out!

Hint: use the argument alpha.palette to modify transparency and shape.palette to change the shape of nodes.

If you need help, a solution is available in the .Rmd file (Chunk solution2D)

3.10 Beware of over-modifications

When it comes to making these customizations, it is important to consider what you are trying to communicate with your network. ggnet2 is pretty flexible with changing node shapes and transparency, which can make it easy to go overboard. Try to make the minimum number of modifications needed to communicate what is important in your network - node shapes become difficult to distinguish if you use more than six different shapes in a plot, and transparencies may not be as easily distinguishable by the reader.

# example of overly modified node shapes
ggnet2(net, shape = sample(1:10))

#example of nodes of different transparencies 
ggnet2(net, alpha = "phono")

3.11 Labeling our nodes

Through the label argument, we can also use ggnet2 to label the nodes of a network using vertex names, another vertex attribute, or any other vector of labels:

# labeling using vertex names
ggnet2(net, label = TRUE)

# labeling using vertex attribute
ggnet2(net, label = "phono")

#labeling using vector of labels
ggnet2(net, label = 1:10)

We can also choose WHICH nodes we want to label. Recall that this network is based on a string of letters, so we can choose to label nodes based on if they are a consonant or a vowel:

# labeling only vowels using a vector of values
ggnet2(net, label = c("a", "e", "i"), color = "phono", label.color = "black")

ggnet2 automatically sets the size of the labels to be half that of the node size, but we can also control the size of the label using the label.size argument, their color using the label.color argument, and their level of transparency using the label.alpha argument:

# changing label size
ggnet2(net, size = 12, label = TRUE, label.size = 5)

#changing label color
ggnet2(net, size = 12, label = TRUE, color = "black", label.color = "white")

# changing label transparency
ggnet2(net, label = TRUE, label.alpha = 0.75)

3.11.1 Exercise 2E

Using the code above as an example, modify your network so that the labels are 1/3 as big as the node size, the nodes are tomato-colored with steel blue labels, and 50% transparency!

3.12 More fun with networks!

There are a LOT more things you can do with networks than there is time to go over in this tutorial. Some examples of other ways we could have modified our network using ggnet2 include:

  • altering the node legends using the alpha.legend, color.legend, shape.legend, and size.legend arguments
  • changing the line type of the edges
  • adding arrows to our edges to indicate directionality
  • coloring edges based on the attributes of connected nodes
  • removing nodes based on missing values

Furthermore, ggnet2 is just ONE tool we can use to visualize networks. Other common packages to visualize networks include igraph and networkD3.

  • igraph is useful for building a network diagram from an adjacency matrix, an edge list, a literal list of connections, and more.
  • networkD3 allows users to build interactive network diagrams with R, with features such as zooming, hovering over nodes, and reorganizing the layout. Interactive figures let readers become active participants in the data visualization process by exploring data points and hierarchies among the data, filtering data by groups, and more.

4 Exercise 3: Visualizing phylogenetic trees in R

4.1 What is a phylogeny?

A phylogenetic tree is a diagram that represents evolutionary relationships among organisms. Phylogenetic trees are hypotheses, not definitive facts, based on collected information about a set of species (e.g., morphology or DNA).

4.2 Formats used in this tutorial

  • FASTA: FASTA format is a text-based format for representing either nucleotide sequences or peptide sequences, in which nucleotides or amino acids are represented using single-letter codes. A sequence in FASTA format begins with a single-line description, followed by lines of sequence data.
  • NEWICK: Newick format is a commonly used way of representing tree topologies as text. Put simply, monophyletic clades are surrounded by parentheses and sister clades are separated by commas. For example, a simple tree could be written as (((A,B),C),(D,E)). Newick format also contains information about branch lengths (after colons) and node names (after closed parentheses).
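To make the two formats concrete, here is a minimal sketch using the ‘ape’ package. The Newick string is the example tree from the text (a trailing semicolon marks the end of the tree), and the two short sequences are made up purely for illustration and written to a temporary file.

```r
library(ape)

# NEWICK: parse the example tree from the text (the trailing ";" ends the tree)
toy_tree <- read.tree(text = "(((A,B),C),(D,E));")
toy_tree$tip.label
plot(toy_tree)

# FASTA: write two made-up aligned sequences to a temporary file, then read them
fasta_path <- tempfile(fileext = ".fasta")
writeLines(c(">seq1 made-up example", "ACGTACGT",
             ">seq2 made-up example", "ACGTTCGT"), fasta_path)
toy_dna <- read.dna(fasta_path, format = "fasta")
toy_dna
```

Each line starting with > in the FASTA file is a description line, and the line after it holds that sequence’s data.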

4.2.1 Creating trees from DNA seqs using the package ape

#set your working directory to the "phylogenies" directory 
#setwd("~/EEB_Program/Courses/Reproducible_Science/Chapter_10/phylogenies")

#install the package ape if you have not already 
#install.packages("ape")
library(ape)

#read the aligned DNA sequences which are in FASTA format
dna <- read.dna(file = "vanilla_seqs_specs.fasta", format = "fasta")

#calculate pairwise genetic distances
#What is the default evolutionary model used to calculate distances?
D <- dist.dna(dna)

#make a neighbor joining tree
#What class is this tree?
tree <- nj(D)
class(tree)
## [1] "phylo"
#plot the tree
plot(tree, cex = 0.6, main = "Unrooted NJ Tree")

#make a rooted tree
tree2 <- root(tree, outgroup = "MK201723.1 Vanilla mexicana")
plot(tree2, main = "Rooted NJ Tree", cex = .7)
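
If you do not have the tutorial's FASTA file at hand, the same workflow can be tried on the woodmouse alignment that ships with ape. This is only a sketch: the object names are made up, the two models shown are examples of what can be passed to the model argument, and the first sequence is used as an outgroup purely for convenience, not for any biological reason.

```r
library(ape)

data(woodmouse)  # example DNA alignment included in ape

# distances under two different models, chosen explicitly
D_raw  <- dist.dna(woodmouse, model = "raw")   # proportion of differing sites
D_jc69 <- dist.dna(woodmouse, model = "JC69")  # Jukes-Cantor correction

# neighbor joining produces an unrooted tree
nj_tree <- nj(D_jc69)
is.rooted(nj_tree)

# root on the first sequence; resolve.root = TRUE makes the root bifurcating
rooted_tree <- root(nj_tree, outgroup = rownames(woodmouse)[1],
                    resolve.root = TRUE)
is.rooted(rooted_tree)

plot(rooted_tree, cex = 0.7, main = "Rooted NJ Tree (woodmouse)")
```

Comparing D_raw and D_jc69 is a quick way to see how much the choice of evolutionary model changes the estimated distances.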

4.2.2 Reading Newick files and visualizing more trees

#use the read.tree function to open newick files 
tree <- read.tree("Theobroma_tree.nwk")
#look at the structure and class
str(tree)
class(tree)

#check out the arguments that can be used to plot trees
?plot.phylo

Plot the tree using different arguments.

#Example 1 - plotting a phylogram
plot(tree, no.margin = TRUE, cex = 0.6, edge.width = 2)

#Example 2 - plotting a cladogram
plot(tree, type = "c")

#Example 3 - plotting an unrooted tree
plot(tree, type = "u")

The trees in examples 1 and 2 look good. Example 3 does not look good. Using the arguments in plot.phylo(), how could this be improved?

4.2.3 Using ggtree to plot phylogenetic trees

ggtree is an R package that extends ggplot2 for visualizing and annotating phylogenetic trees with their covariates and other associated data. It is available from Bioconductor, a project that provides tools for analyzing and annotating various kinds of genomic data.
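
A typical way to install a Bioconductor package is through the BiocManager package from CRAN. The lines below are a minimal sketch of that route and assume you have an internet connection and a version of R supported by the current Bioconductor release.

```r
# install BiocManager from CRAN only if it is missing
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}

# install ggtree from Bioconductor
BiocManager::install("ggtree")
```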

You can also install the package from GitHub.

devtools::install_github("YuLab-SMU/ggtree")
## Skipping install of 'ggtree' from a github remote, the SHA1 (523d4945) has not changed since last install.
##   Use `force = TRUE` to force installation
library(tidyverse)
library(ggtree)

Make a basic tree with ggtree.

tree <- read.tree("vanilla.nwk")
ggtree(tree)

Add a scale.

# add a scale
ggtree(tree) + geom_treescale()

# or add the entire scale to the x axis with theme_tree2()
ggtree(tree) + theme_tree2()

Remove branch lengths from the tree.

ggtree(tree, branch.length="none")

#change the color, width, and line type of the branches
ggtree(tree, branch.length="none", color="blue", size=2, linetype=3)

# create the basic plot
p <- ggtree(tree)

# add node points, tip labels, and clade labels
p + geom_nodepoint() +
  theme_tree2() +
  xlim(0, .02) +
  theme_tree() +
  geom_tiplab() +
  geom_cladelabel(node = 11, label = "A clade", offset = .005, align = TRUE, color = "red") +
  geom_cladelabel(node = 20, label = "Another clade", offset = .0045, align = TRUE, color = "blue")
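
The node numbers passed to geom_cladelabel() depend on the tree, so it helps to look them up rather than guess. The sketch below uses a random tree from ape's rtree() instead of the vanilla tree; the object names, the tips t1 and t2, and the offset value are arbitrary choices for illustration.

```r
library(ape)
library(ggplot2)
library(ggtree)

set.seed(42)
toy_tree <- rtree(8)  # random 8-tip tree with tips named t1 ... t8

# option 1: print the internal node numbers on the plot
p_nodes <- ggtree(toy_tree) +
  geom_tiplab() +
  geom_text(aes(label = node), hjust = -0.3)
p_nodes

# option 2: ask for the most recent common ancestor of two tips
clade_node <- getMRCA(toy_tree, c("t1", "t2"))

# label the clade defined by that node
p_clade <- ggtree(toy_tree) +
  geom_tiplab() +
  geom_cladelabel(node = clade_node, label = "t1 + t2 clade",
                  offset = 0.3, color = "red")
p_clade
```

If a label runs off the edge of the plot, widening the x axis with xlim(), as in the example above, usually fixes it.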

7 Appendix 1

Citations of all R packages used to generate this report.

library("knitcitations")
cleanbib()
options("citation_format" = "pandoc")
read.bibtex(file = "packages.bib")

[1] R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. Vienna, Austria, 2020. <URL: https://www.R-project.org/>.

8 Appendix 2

Version information about R, the operating system (OS), and attached or loaded R packages. This appendix was generated using sessionInfo().

## R version 4.0.2 (2020-06-22)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Catalina 10.15.5
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] grid      stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] knitcitations_1.0.10 ggtree_2.5.0.991     forcats_0.5.0       
##  [4] stringr_1.4.0        dplyr_1.0.2          purrr_0.3.4         
##  [7] tidyr_1.1.2          tibble_3.0.4         tidyverse_1.3.0     
## [10] devtools_2.3.2       usethis_1.6.3        ape_5.4-1           
## [13] RColorBrewer_1.1-2   sna_2.6              statnet.common_4.4.1
## [16] network_1.16.1       GGally_2.0.0         ggpubr_0.4.0        
## [19] gridExtra_2.3        ggplotgui_1.0.0      ggplot2_3.3.2       
## [22] readr_1.4.0          knitr_1.30          
## 
## loaded via a namespace (and not attached):
##   [1] colorspace_1.4-1    ggsignif_0.6.0      ellipsis_0.3.1     
##   [4] rio_0.5.16          rprojroot_1.3-2     fs_1.5.0           
##   [7] aplot_0.0.6         rstudioapi_0.11     farver_2.0.3       
##  [10] remotes_2.2.0       fansi_0.4.1         lubridate_1.7.9    
##  [13] RefManageR_1.2.12   xml2_1.3.2          pkgload_1.1.0      
##  [16] jsonlite_1.7.1      broom_0.7.2         dbplyr_1.4.4       
##  [19] shiny_1.5.0         BiocManager_1.30.10 compiler_4.0.2     
##  [22] httr_1.4.2          rvcheck_0.1.8       backports_1.2.0    
##  [25] assertthat_0.2.1    fastmap_1.0.1       lazyeval_0.2.2     
##  [28] cli_2.1.0           later_1.1.0.1       htmltools_0.5.0    
##  [31] prettyunits_1.1.1   tools_4.0.2         coda_0.19-4        
##  [34] gtable_0.3.0        glue_1.4.2          Rcpp_1.0.5         
##  [37] rle_0.9.2           carData_3.0-4       cellranger_1.1.0   
##  [40] vctrs_0.3.4         nlme_3.1-149        xfun_0.18          
##  [43] ps_1.4.0            openxlsx_4.2.2      testthat_3.0.0     
##  [46] rvest_0.3.6         mime_0.9            lifecycle_0.2.0    
##  [49] rstatix_0.6.0       scales_1.1.1        hms_0.5.3          
##  [52] promises_1.1.1      parallel_4.0.2      yaml_2.2.1         
##  [55] curl_4.3            memoise_1.1.0       reshape_0.8.8      
##  [58] stringi_1.5.3       highr_0.8           desc_1.2.0         
##  [61] tidytree_0.3.3      bibtex_0.4.2.3      pkgbuild_1.1.0     
##  [64] zip_2.1.1           rlang_0.4.8         pkgconfig_2.0.3    
##  [67] evaluate_0.14       lattice_0.20-41     labeling_0.4.2     
##  [70] treeio_1.15.0       patchwork_1.0.1     htmlwidgets_1.5.2  
##  [73] cowplot_1.1.0       processx_3.4.4      tidyselect_1.1.0   
##  [76] plyr_1.8.6          magrittr_1.5        R6_2.5.0           
##  [79] generics_0.1.0      DBI_1.1.0           pillar_1.4.6       
##  [82] haven_2.3.1         foreign_0.8-80      withr_2.3.0        
##  [85] abind_1.4-5         modelr_0.1.8        crayon_1.3.4       
##  [88] car_3.0-10          plotly_4.9.2.1      rmarkdown_2.4      
##  [91] readxl_1.3.1        data.table_1.13.0   blob_1.2.1         
##  [94] callr_3.5.1         reprex_0.3.0        digest_0.6.27      
##  [97] xtable_1.8-4        httpuv_1.5.4        munsell_0.5.0      
## [100] viridisLite_0.3.0   sessioninfo_1.1.1